spikeslab: Prediction and Variable Selection Using Spike and Slab Regression
Authors
Hemant Ishwaran, Udaya B. Kogalur, J. Sunil Rao
Abstract
Weighted generalized ridge regression offers unique advantages in correlated high-dimensional problems. Such estimators can be efficiently computed using Bayesian spike and slab models and are effective for prediction. For sparse variable selection, a generalization of the elastic net can be used in tandem with these Bayesian estimates. In this article, we describe the R software package spikeslab for implementing this new spike and slab prediction and variable selection methodology.

The expression spike and slab, originally coined by Mitchell and Beauchamp (1988), refers to a type of prior used for the regression coefficients in linear regression models (see also Lempers (1971)). In Mitchell and Beauchamp (1988), this prior assumed that the regression coefficients were mutually independent with a two-point mixture distribution made up of a uniform flat distribution (the slab) and a degenerate distribution at zero (the spike). George and McCulloch (1993) used a different prior for the regression coefficients, involving a scale (variance) mixture of two normal distributions. In particular, the use of a normal prior was instrumental in facilitating efficient Gibbs sampling of the posterior. This made spike and slab variable selection computationally attractive and heavily popularized the method.

As pointed out in Ishwaran and Rao (2005), normal-scale mixture priors, such as those used in George and McCulloch (1993), constitute a wide class of models termed spike and slab models. Spike and slab models were extended to the class of rescaled spike and slab models (Ishwaran and Rao, 2005). Rescaling was shown to induce a non-vanishing penalization effect and, when used in tandem with a continuous bimodal prior, to confer useful model selection properties on the posterior mean of the regression coefficients (Ishwaran and Rao, 2005, 2010).

Recently, Ishwaran and Rao (2010) considered the geometry of generalized ridge regression (GRR), a method introduced by Hoerl and Kennard to overcome ill-conditioned regression settings (Hoerl and Kennard, 1970a,b). This analysis showed that GRR possesses unique advantages in high-dimensional correlated settings, and that weighted GRR (WGRR), a generalization of GRR, is potentially even more effective. Noting that the posterior mean of the regression coefficients from a rescaled spike and slab model is a type of WGRR estimator, they showed that this WGRR estimator, referred to as the Bayesian model averaged (BMA) estimator, when coupled with dimension reduction, yielded low test-set mean-squared error compared to the elastic net (Zou and Hastie, 2005).

Additionally, Ishwaran and Rao (2010) introduced a generalization of the elastic net, which they coined the gnet (short for generalized elastic net). The gnet is the solution to a least-squares penalization problem in which the penalization involves an overall ℓ1-regularization parameter (used to impose sparsity) and a unique ℓ2-regularization parameter for each variable (the latter parameters being introduced to combat multicollinearity). To calculate the gnet, a lasso-type optimization is used, fixing the ℓ2-regularization parameters at values determined by finding the closest GRR to the BMA. Like the BMA, the gnet is highly effective for prediction. However, unlike the BMA, which is obtained by model averaging and therefore often contains many small coefficient values, the gnet is much sparser, making it more attractive for variable selection.
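Schematically, the gnet solves a criterion of the following form (a sketch in our own notation, not the paper's exact formulation; see Ishwaran and Rao (2010) for details):

% Schematic gnet criterion: Y is the response, X the n x p design matrix.
\hat{\beta}^{\mathrm{gnet}}
  = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p}
    \left\{ \|Y - X\beta\|_2^2
            + \sum_{k=1}^{p} \lambda_k \beta_k^2   % variable-specific l2 penalties
            + \lambda \sum_{k=1}^{p} |\beta_k|     % overall l1 (sparsity) penalty
    \right\}

Here λ ≥ 0 imposes sparsity and each λk ≥ 0 penalizes its own coefficient. Because the λk are held fixed at the values given by the closest GRR to the BMA, the remaining optimization over λ is a lasso-type problem.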
The gnet and BMA estimators represent attractive solutions for modern-day high-dimensional data settings. These problems often involve correlated variables, in part due to the nature of the data, and in part as an artifact of the dimensionality [see Cai and Lv (2007); Fan and Lv (2008) for a detailed discussion of high-dimensional correlation]. The BMA is attractive because it addresses the issue of correlation by drawing upon the properties of WGRR estimation, a strength of the Bayesian approach, while the gnet achieves sparse variable selection by drawing upon the principle of soft-thresholding, a powerful frequentist regularization concept. Because high-dimensional data is becoming increasingly common, it would be valuable to have user-friendly software for computing the gnet and BMA estimators. With this in mind, we have developed an R package, spikeslab, for implementing this methodology (Ishwaran, Kogalur and Rao, 2010). The main purpose of this article is to describe this package. Because this new spike and slab approach may be unfamiliar to users in the R community, we start by giving a brief high-level description of the algorithm [for further details, readers should consult Ishwaran and Rao (2010)]. We then highlight some of the package's key features, illustrating its use in both low- and high-dimensional settings.

The spike and slab algorithm

The spikeslab R package implements the rescaled spike and slab algorithm described in Ishwaran and Rao (2010). This algorithm involves three key steps:

1. Filtering (dimension reduction).
2. Model Averaging (BMA).
3. Variable Selection (gnet).

Step 1 filters all but the top nF variables, where n is the sample size and F > 0 is a user-specified fraction. Variables are ordered on the basis of their absolute posterior mean coefficient value, where the posterior mean is calculated using Gibbs sampling applied to an approximate rescaled spike and slab posterior. Below is a toy illustration of how filtering works [p is the total number of variables and V(1), ..., V(p) are the ordered variables]:

    retain these variables: V(1), ..., V([nF])
    filter these variables: V([nF]+1), ..., V(p)

The value of F is set using the option bigp.smalln.factor, which by default is set to F = 1. The use of an approximate posterior in the filtering step is needed in high dimensions. This yields an ultra-fast Gibbs sampling procedure, with each step of the Gibbs sampler requiring O(np) operations. Thus, the computational effort for the filtering step is linear in both dimensions.

Step 2 fits a rescaled spike and slab model using only those variables that are not filtered in Step 1. Model fitting is implemented using a Gibbs sampler. Computational times are generally rapid, as the number of variables at this point is a fraction of the original size p. A blocking technique is used to further reduce computational times. The posterior mean of the regression coefficients, which we refer to as the BMA, is calculated and returned. This (restricted) BMA is used as an estimator for the regression coefficients.

Step 3 calculates the gnet. In the optimization, the gnet's ℓ2-regularization parameters are fixed (these being determined from the restricted BMA obtained in Step 2) and its solution path with respect to its ℓ1-regularization parameter is calculated using the lars R package (Hastie and Efron, 2007) [a package dependency of spikeslab]. The lars wrapper is called with type="lar" to produce the full LAR path solution (Efron et al., 2004). The gnet is defined as the model in this path solution minimizing the AIC criterion. Note, importantly, that cross-validation is not used to optimize the ℓ1-regularization parameter. The gnet estimator is generally very stable, a property it inherits from the BMA, and thus even simple model selection methods such as AIC work quite well in optimizing its path solution. This differs from, say, the elastic net (Zou and Hastie, 2005), where cross-validation is typically used to determine the regularization parameters (often involving a double optimization over both the ℓ1- and ℓ2-regularization parameters). This is an important feature which reduces computational times in big-p problems.
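As a sketch of how a high-dimensional analysis engaging all three steps might be invoked (the data here are simulated placeholders; bigp.smalln and bigp.smalln.factor are the package options discussed above):

library(spikeslab)

## Simulated placeholder data with p much larger than n.
set.seed(1)
n <- 100
p <- 2000
x <- matrix(rnorm(n * p), nrow = n)
y <- x[, 1] - x[, 2] + rnorm(n)

## bigp.smalln = TRUE engages Step 1 (filtering); with the default
## bigp.smalln.factor = 1, roughly the top n variables are retained
## before Step 2 (BMA) and Step 3 (gnet) are carried out.
obj <- spikeslab(x = x, y = y, bigp.smalln = TRUE, bigp.smalln.factor = 1)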
Low-dimensional settings

Although the spike and slab algorithm is especially adept in high-dimensional settings, it can be used effectively in classical settings as well. In these low-dimensional scenarios, when p < n, the algorithm is implemented by applying only Steps 2 and 3 (i.e., Step 1 is skipped). The default setting for spikeslab, in fact, assumes a low-dimensional scenario.

As an illustration, we consider the benchmark diabetes data (n = 442, p = 64) used in Efron et al. (2004), which is an example dataset included in the package. The response Y is a quantitative measure of disease progression for patients with diabetes. The data includes 10 baseline measurements for each patient, in addition to 45 interactions and 9 quadratic terms, for a total of 64 variables for each patient. The following code implements a default analysis:

data(diabetesI, package = "spikeslab")
set.seed(103608)
obj <- spikeslab(Y ~ . , diabetesI)
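The fitted object can then be inspected with the package's print and plot methods; a minimal sketch (the exact output format is described in the package documentation):

## Summarize the fit, including the BMA and gnet coefficient estimates.
print(obj)

## Plot the gnet solution path.
plot(obj)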
Similar Articles
Orthogonalized smoothing for rescaled spike and slab models
Rescaled spike and slab models are a new Bayesian variable selection method for linear regression models. In high-dimensional orthogonal settings, such models have been shown to possess optimal model selection properties. We review background theory and discuss applications of rescaled spike and slab models to prediction problems involving orthogonal polynomials. We first consider global smoothi...
Spike and Slab Variable Selection: Frequentist and Bayesian Strategies
Variable selection in the linear regression model takes many apparent faces from both frequentist and Bayesian standpoints. In this paper we introduce a variable selection method referred to as a rescaled spike and slab model. We study the importance of prior hierarchical specifications and draw connections to frequentist generalized ridge regression estimation. Specifically, we study the usefu...
Combining a relaxed EM algorithm with Occam's razor for Bayesian variable selection in high-dimensional regression
We address the problem of Bayesian variable selection for high-dimensional linear regression. We consider a generative model that uses a spike-and-slab-like prior distribution obtained by multiplying a deterministic binary vector, which encodes the sparsity of the problem, with a random Gaussian parameter vector. The originality of the work is to consider inference through relaxing the model a...
Generalized spike-and-slab priors for Bayesian group feature selection using expectation propagation
We describe a Bayesian method for group feature selection in linear regression problems. The method is based on a generalized version of the standard spike-and-slab prior distribution which is often used for individual feature selection. Exact Bayesian inference under the prior considered is infeasible for typical regression problems. However, approximate inference can be carried out efficientl...
The Spike-and-Slab LASSO
Despite the wide adoption of spike-and-slab methodology for Bayesian variable selection, its potential for penalized likelihood estimation has largely been overlooked. In this paper, we bridge this gap by cross-fertilizing these two paradigms with the Spike-and-Slab LASSO procedure for variable selection and parameter estimation in linear regression. We introduce a new class of self-adaptive pe...